Mutual Information and Diverse Decoding Improve Neural Machine Translation

Authors

  • Jiwei Li
  • Daniel Jurafsky
Abstract

Sequence-to-sequence neural translation models learn semantic and syntactic relations between sentence pairs by optimizing the likelihood of the target given the source, i.e., p(y|x), an objective that ignores other potentially useful sources of information. We introduce an alternative objective function for neural MT that maximizes the mutual information between the source and target sentences, modeling the bi-directional dependency of sources and targets. We implement the model with a simple re-ranking method, and also introduce a decoding algorithm that increases diversity in the N-best list produced by the first pass. Applied to the WMT German/English and French/English tasks, both mechanisms offer a consistent performance boost on both standard LSTM and attention-based neural MT architectures. The result is the best published performance for a single (non-ensemble) neural MT system, as well as the potential application of our diverse decoding algorithm to other NLP re-ranking tasks.
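As a rough illustration of the re-ranking idea, a mutual-information-style objective is commonly approximated by combining the forward score log p(y|x) with a backward score log p(x|y). The sketch below re-ranks a toy N-best list this way; the function name, the λ weight, and the scores are illustrative assumptions, not the paper's exact implementation:

```python
def mmi_rerank(nbest, lam=0.5):
    """Re-rank an N-best list by a mutual-information-style score.

    `nbest` is a list of (hypothesis, log_p_y_given_x, log_p_x_given_y)
    tuples; the two log-probabilities would come from a forward and a
    backward translation model (hypothetical inputs here).
    Combined score: log p(y|x) + lam * log p(x|y).
    """
    scored = [(fwd + lam * bwd, hyp) for hyp, fwd, bwd in nbest]
    scored.sort(reverse=True)  # best combined score first
    return [hyp for _, hyp in scored]

# toy example: the backward score flips the forward-only ranking
nbest = [("thanks a lot", -1.0, -6.0),
         ("many thanks",  -1.5, -2.0)]
print(mmi_rerank(nbest))  # ['many thanks', 'thanks a lot']
```

Under the forward score alone, "thanks a lot" wins; adding the backward term rewards the hypothesis from which the source is easier to recover.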


Related Papers

Statistical Machine Translation Improvement based on Phrase Selection

This paper describes the importance of introducing a phrase-based language model in the process of machine translation. Today's SMT systems translate with phrases, but their language models are based on classical n-grams. In this paper we introduce a phrase-based language model (PBLM) in the decoding process to try to match the phrases of a translation table with those predicted b...

A Simple, Fast Diverse Decoding Algorithm for Neural Generation

We propose a simple, fast decoding algorithm that fosters diversity in neural generation. The algorithm modifies standard beam search by penalizing hypotheses that are siblings—expansions of the same parent node in the search—thus favoring hypotheses from diverse parents. We evaluate the model on three neural generation tasks: dialogue response generation, abstractive su...
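The sibling penalty described above can be sketched in a few lines: each parent's k-th best expansion is docked γ·k before the global beam cut. The function names and the toy expansion model below are assumptions for illustration, not the paper's code:

```python
import heapq

def diverse_beam_step(beams, expand, beam_size, gamma=1.0):
    """One step of diversity-promoting beam search via a sibling penalty.

    `beams` holds (score, hypothesis) pairs; `expand(hyp)` returns
    candidate (log_prob, token) continuations (a hypothetical stand-in
    for the model). Each parent's k-th best child is penalized by
    gamma * k before the global top-`beam_size` cut, so survivors tend
    to come from different parents.
    """
    candidates = []
    for score, hyp in beams:
        # rank this parent's expansions best-first
        children = sorted(expand(hyp), reverse=True)
        for rank, (logp, tok) in enumerate(children, start=1):
            candidates.append((score + logp - gamma * rank, hyp + [tok]))
    return heapq.nlargest(beam_size, candidates)

# toy model: every parent proposes the same three continuations
def toy_expand(hyp):
    return [(-0.1, "a"), (-0.2, "b"), (-3.0, "c")]

beams = [(0.0, ["good"]), (-0.5, ["fine"])]
# with gamma=0 both survivors come from the "good" parent;
# with gamma=1 the two survivors come from different parents
survivors = diverse_beam_step(beams, toy_expand, beam_size=2, gamma=1.0)
print([hyp for _, hyp in survivors])  # [['good', 'a'], ['fine', 'a']]
```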

Single-Queue Decoding for Neural Machine Translation

Neural machine translation models rely on the beam search algorithm for decoding. In practice, we found that the fixed beam size negatively affects the quality of hypotheses in the search space. To mitigate this problem, we store all hypotheses in a single priority queue and use a universal score function for hypothesis selection. The proposed algorithm is more flexible as the disc...
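The single-queue idea can be sketched with a heap keyed by a length-normalized score, so partial hypotheses of different lengths compete directly instead of being pruned per time step. The names, the normalization choice, and the toy model below are assumptions, not the paper's exact score function:

```python
import heapq

def single_queue_decode(expand, is_final, start, max_steps=100):
    """Decode with one priority queue over all partial hypotheses.

    `expand(hyp)` returns candidate (log_prob, token) continuations and
    `is_final(hyp)` tests for end-of-sentence (both hypothetical
    stand-ins for the model). The queue is keyed by a universal,
    length-normalized log-probability.
    """
    queue = [(0.0, start)]  # (negated normalized score, hypothesis)
    for _ in range(max_steps):
        if not queue:
            break
        neg_norm, hyp = heapq.heappop(queue)
        if is_final(hyp):
            return hyp
        total = -neg_norm * len(hyp)  # recover the raw log-probability
        for logp, tok in expand(hyp):
            new_hyp = hyp + [tok]
            heapq.heappush(queue, (-(total + logp) / len(new_hyp), new_hyp))
    return None

# toy model: two continuations per step, forced EOS after length 3
def toy_expand(hyp):
    if len(hyp) >= 3:
        return [(-0.01, "</s>")]
    return [(-0.1, "a"), (-0.5, "</s>")]

print(single_queue_decode(toy_expand, lambda h: h[-1] == "</s>", ["<s>"]))
# ['<s>', 'a', 'a', '</s>']
```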

Dynamic Oracle for Neural Machine Translation in Decoding Phase

The past several years have witnessed the rapid progress of end-to-end Neural Machine Translation (NMT). However, there exists a discrepancy between training and inference in NMT when decoding, which may lead to serious problems since the model might be in a part of the state space it has never seen during training. To address the issue, Scheduled Sampling has been proposed. However, there are ce...

Improving Neural Machine Translation through Phrase-based Forced Decoding

Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main ...


Journal:
  • CoRR

Volume: abs/1601.00372

Publication date: 2016